
Deep Interactive Region Segmentation and Captioning



Abstract

With recent innovations in dense image captioning, it is now possible to describe every object of the scene with a caption while objects are determined by bounding boxes. However, interpretation of such an output is not trivial due to the existence of many overlapping bounding boxes. Furthermore, in current captioning frameworks, the user is not able to apply personal preferences to exclude areas that are not of interest. In this paper, we propose a novel hybrid deep learning architecture for interactive region segmentation and captioning where the user is able to specify an arbitrary region of the image that should be processed. To this end, a dedicated Fully Convolutional Network (FCN) named Lyncean FCN (LFCN) is trained using our special training data to isolate the User Intention Region (UIR) as the output of an efficient segmentation. In parallel, a dense image captioning model is utilized to provide a wide variety of captions for that region. Then, the UIR is explained with the caption of the best-match bounding box. To the best of our knowledge, this is the first work that provides such a comprehensive output. Our experiments show the superiority of the proposed approach over state-of-the-art interactive segmentation methods on several well-known datasets. In addition, replacement of the bounding boxes with the result of the interactive segmentation leads to a better understanding of the dense image captioning output as well as accuracy enhancement for object detection in terms of Intersection over Union (IoU).
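The abstract's matching step — explaining the UIR with the caption of the best-match bounding box, evaluated via Intersection over Union — can be illustrated with a minimal sketch. The paper does not specify the exact matching criterion, so the function names (`iou`, `best_caption`) and the assumption that the best match is the candidate box with highest IoU against the UIR's bounding box are illustrative, not the authors' implementation:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_caption(uir_box, candidates):
    """Pick the dense-captioning candidate whose box best overlaps the UIR box.

    candidates: list of dicts like {"box": (x1, y1, x2, y2), "caption": str}.
    """
    return max(candidates, key=lambda c: iou(uir_box, c["box"]))["caption"]
```

For example, given a UIR box of `(0, 0, 10, 10)` and two candidates with boxes `(0, 0, 4, 4)` and `(1, 1, 9, 9)`, the second candidate wins (IoU 0.64 vs 0.16) and its caption is returned.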
